OpenMP-oriented Applications for Distributed Shared Memory Architectures
Authors
Abstract
The rapid emergence of OpenMP as the preferred parallel programming paradigm for small-to-medium scale parallelism could decline unless OpenMP demonstrates that it can be the model of choice for large-scale high-performance parallel computing in the next decade. The main stumbling block to adapting OpenMP for distributed shared memory (DSM) machines, which are based on architectures such as cc-NUMA, stems from the absence of mechanisms for placing data among processors and threads to achieve data locality. The absence of such a mechanism causes remote memory accesses and inefficient use of cache memory, both of which lead to poor performance. This paper presents a simple software programming approach called Copy-inside-Copy-back (CC) that exploits the privatization mechanism of OpenMP for data placement and re-placement. The technique lets the programmer distribute data without giving up control or flexibility, and is thus an alternative to the traditional implicit and explicit approaches. Moreover, the CC approach enables an SPMD style of programming that makes the development of an OpenMP application more structured and simpler to modify and debug. The CC technique was tested and analyzed using the NAS Parallel Benchmarks on an SGI Origin 2000 multiprocessor machine. The lesson learned from this study is that OpenMP can deliver the desired large-scale parallelism, although a fast copy mechanism is essential.
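To make the idea concrete, below is a minimal sketch of what one Copy-inside-Copy-back step could look like in C with OpenMP, assuming a one-dimensional shared array updated in an SPMD-style parallel region. The array name, sizes, and the per-element update are illustrative and not taken from the paper: each thread copies its block of the shared array into a thread-private buffer (so first-touch allocation places those pages in local memory), computes on the private buffer, and then copies the result back.

/* Illustrative sketch of the Copy-inside-Copy-back (CC) pattern.
 * Assumptions: 1-D array "global", block decomposition, a trivial
 * per-element update standing in for the real computation. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N      1000000
#define NSTEPS 100

int main(void)
{
    double *global = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) global[i] = (double)i;

    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int tid      = omp_get_thread_num();

        /* Block decomposition of the index range for this thread. */
        int chunk = (N + nthreads - 1) / nthreads;
        int lo = tid * chunk; if (lo > N) lo = N;
        int hi = lo + chunk;  if (hi > N) hi = N;
        int len = hi - lo;

        /* Copy-inside: each thread allocates and first-touches a private
         * copy of its block, so its pages land in local memory. */
        double *local = malloc(len * sizeof(double));
        for (int i = 0; i < len; i++) local[i] = global[lo + i];

        /* Compute on the private copy; all accesses are local. */
        for (int step = 0; step < NSTEPS; step++)
            for (int i = 0; i < len; i++)
                local[i] = 0.5 * local[i] + 1.0;

        /* Copy-back: write the results to the shared array. */
        for (int i = 0; i < len; i++) global[lo + i] = local[i];
        free(local);
    }

    printf("global[0] = %f\n", global[0]);
    free(global);
    return 0;
}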
Similar resources
MPI+OpenMP Implementation of Memory-Saving Parallel PIC Applications on Hierarchical Distributed-Shared Memory Architectures
The combination of inter-node and intra-node domain-decomposition strategies for the development of memory-saving parallel Particle-in-cell simulation codes, targeted to hierarchical distributed-shared memory architectures, is discussed in this paper, along with its MPI+OpenMP implementation. Particular emphasis is given to the devised dynamic workload balancing technique.
Hierarchical MPI+OpenMP Implementation of Parallel PIC Applications on Clusters of Symmetric MultiProcessors
The hierarchical combination of decomposition strategies for the development of parallel Particle-in-cell simulation codes, targeted to hierarchical distributed-shared memory architectures, is discussed in this paper, along with its MPI+OpenMP implementation. Particular emphasis is given to the devised dynamic workload balancing technique.
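A minimal sketch of this two-level pattern is given below, under illustrative assumptions (the array size, the trivial update kernel, and names such as field and n_local are hypothetical, not from the paper): each MPI process owns one sub-domain at the inter-node level, while OpenMP threads share the loop over that sub-domain at the intra-node level.

/* Illustrative hybrid MPI+OpenMP sketch: MPI domain decomposition
 * across nodes, OpenMP loop parallelism within each node. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N_GLOBAL 1000000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Inter-node level: each MPI process owns one sub-domain. */
    int n_local = N_GLOBAL / nprocs;
    double *field = calloc(n_local, sizeof(double));

    /* Intra-node level: OpenMP threads share the sub-domain loop. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n_local; i++) {
        field[i] += 1.0;              /* stand-in for the real update */
        local_sum += field[i];
    }

    /* Combine per-process results across nodes. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %f\n", global_sum);

    free(field);
    MPI_Finalize();
    return 0;
}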
A Compile-Time OpenMP Cost Model
OpenMP is a de facto API for parallel programming in C/C++ and Fortran on shared memory and distributed shared memory platforms. It is also being increasingly used with MPI to form a hybrid programming model and is expected to be a promising candidate to exploit emerging multicore architectures. An OpenMP cost model is an analytical model that reflects the characteristics of OpenMP applications...
UTS: An Unbalanced Tree Search Benchmark
This paper presents an unbalanced tree search (UTS) benchmark designed to evaluate the performance and ease of programming for parallel applications requiring dynamic load balancing. We describe algorithms for building a variety of unbalanced search trees to simulate different forms of load imbalance. We created versions of UTS in two parallel languages, OpenMP and Unified Parallel C (UPC), usi...
Porting and performance evaluation of irregular codes using OpenMP
In the last two years, OpenMP has been gaining popularity as a standard for developing portable shared memory parallel programs. With the improvements in centralized shared memory technologies and the emergence of distributed shared memory (DSM) architectures, several medium-to-large physical and logical shared memory configurations are now available. Thus, OpenMP stands to be a promising medium...
Journal:
Volume / Issue:
Pages: -
Publication date: 2002